Learning and Detecting Concept Drift
نویسنده
چکیده
The volume of data that humans create has increased explosively as information science and technology have evolved. Therefore, the demand for learning machines that can extract input-output mappings and knowledge rules from massive data sets has become more urgent, and machine learning is now a core technology in the advanced information society. It has been applied to fields such as pattern recognition, search engines, medical support, robot engineering, image processing, and data mining and has achieved significant accomplishments in each field. Recently, several methods that could not be implemented with older computers have been developed with state-ofthe-art computers that have enormous memory capacity and high performance CPUs. Machine learning is expected to continue to develop in the future. Machine learning can be roughly classified into two types of learning based on how training examples are presented: batch learning and online learning. Batch learning systems are first given a large number of examples that they then learn all at a once. In contrast, online learning systems are given examples sequentially that they learn one by one. Many excellent batch learning systems have been proposed, but there are serious problems with online learning, especially in environments where the statistical properties of the target variable change over time. This change, known as concept drift, can happen either gradually or suddenly and significantly. The effectiveness of strategies for building a good learning system depends on the types of changes, so it is difficult to create an ideal learning system. A prime example of concept drift is the spam filtering problem. An effective spam filter must be able to handle various
منابع مشابه
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملConcept drift detection in business process logs using deep learning
Process mining provides a bridge between process modeling and analysis on the one hand and data mining on the other hand. Process mining aims at discovering, monitoring, and improving real processes by extracting knowledge from event logs. However, as most business processes change over time (e.g. the effects of new legislation, seasonal effects and etc.), traditional process mining techniques ...
متن کاملNew Ensemble Method for Classification of Data Streams
Classification of data streams has become an important area of data mining, as the number of applications facing these challenges increases. In this paper, we propose a new ensemble learning method for data stream classification in presence of concept drift. Our method is capable of detecting changes and adapting to new concepts which appears in the stream. Data stream classification; concept d...
متن کاملConcept Drift Detection Through Resampling
Detecting changes in data-streams is an important part of enhancing learning quality in dynamic environments. We devise a procedure for detecting concept drifts in data-streams that relies on analyzing the empirical loss of learning algorithms. Our method is based on obtaining statistics from the loss distribution by reusing the data multiple times via resampling. We present theoretical guarant...
متن کاملA Simple Unlearning Framework for Online Learning Under Concept Drifts
Real-world online learning applications often face data coming from changing target functions or distributions. Such changes, called the concept drift, degrade the performance of traditional online learning algorithms. Thus, many existing works focus on detecting concept drift based on statistical evidence. Other works use sliding window or similar mechanisms to select the data that closely ref...
متن کاملModeling Concept Drift: A Probabilistic Graphical Model Based Approach
An often used approach for detecting and adapting to concept drift when doing classification is to treat the data as i.i.d. and use changes in classification accuracy as an indication of concept drift. In this paper, we take a different perspective and propose a framework, based on probabilistic graphical models, that explicitly represents concept drift using latent variables. To ensure efficie...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008